    RePP-C: runtime estimation of performance-power with workload consolidation in CMPs

    Configuring hardware knobs in multicore environments to meet performance-power demands is a desirable feature of modern data centers. At the same time, high energy efficiency (performance per watt) requires optimal thread-to-core assignment. In this paper, we present RePP-C, a runtime estimator of performance-power characterized by processor frequency states (P-states), a wide range of sleep intervals (C-states), and workload consolidation. We also present a schema for frequency- and contention-aware thread-to-core assignment (FACTS) which considers diverse thread demands. RePP-C selects a hardware configuration for each active core that satisfies the performance-power demands, while FACTS maps threads to cores. Our results show that FACTS improves over state-of-the-art schedulers such as Distributed Intensity Online (DIO) and the native Linux scheduler by 8.25% and 37.56% in performance, with simultaneous improvements in energy efficiency of 6.2% and 14.17%, respectively. Moreover, we demonstrate the usability of RePP-C by predicting performance and power for 7 different types of workloads and 10 different QoS targets. The results show an average error of 7.55% and 8.96% (with 95% confidence interval) when predicting energy and performance, respectively.
    This work has been partially supported by the European Union FP7 program through the Mont-Blanc-2 project (FP7-ICT-610402), by the Ministerio de Economía y Competitividad under contract Computación de Altas Prestaciones VII (TIN2015-65316-P), and the Departament d'Innovació, Universitats i Empresa de la Generalitat de Catalunya under project MPEXPAR: Models de Programació i Entorns d'Execució Paral·lels (2014-SGR-1051).
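    The selection step the abstract describes can be illustrated with a minimal sketch: given a model that predicts (performance, power) for each (P-state, C-state) pair, pick the cheapest configuration that still meets the demand. The estimator, configuration space, and numbers below are hypothetical placeholders, not RePP-C's published model:

```python
# Illustrative sketch (not the authors' code): choosing a per-core
# hardware configuration from RePP-C-style performance/power estimates.
from itertools import product

P_STATES = [1.2, 1.6, 2.0, 2.4]   # hypothetical core frequencies (GHz)
C_STATES = [0, 1, 3, 6]           # hypothetical idle-state depths

def estimate(freq, cstate, workload):
    """Placeholder for a runtime model mapping a (P-state, C-state)
    pair to predicted (performance, power) for a given workload."""
    perf = workload["ipc"] * freq * (1.0 - 0.02 * cstate)
    power = workload["base_w"] + 2.5 * freq ** 2
    return perf, power

def pick_config(workload, qos_perf):
    """Return the lowest-power configuration whose predicted
    performance still meets the QoS target, if any exists."""
    best, best_power = None, float("inf")
    for freq, cstate in product(P_STATES, C_STATES):
        perf, power = estimate(freq, cstate, workload)
        if perf >= qos_perf and power < best_power:
            best, best_power = (freq, cstate), power
    return best

print(pick_config({"ipc": 1.4, "base_w": 10.0}, qos_perf=3.0))
```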

    REPP-H: runtime estimation of power and performance on heterogeneous data centers

    Modern data centers increasingly demand improved performance with minimal power consumption. Managing the power and performance requirements of applications is challenging because these data centers, incidentally or intentionally, have to deal with server architecture heterogeneity [19], [22]. One critical challenge data centers face is how to manage system power and performance given the different application behavior across multiple architectures.
    This work has been supported by the EU FP7 program (Mont-Blanc 2, ICT-610402), by the Ministerio de Economía (CAP-VII, TIN2015-65316-P), and the Generalitat de Catalunya (MPEXPAR, 2014-SGR-1051). The material herein is based in part upon work supported by the US NSF, grant numbers ACI-1535232 and CNS-1305220.

    Inception: we need to go wider


    Runtime estimation of performance–power in CMPs under QoS constraints

    One of the main challenges in data center systems is operating under a given Quality of Service (QoS) while minimizing power consumption. Increasingly, data centers are exploring and adopting heterogeneous server architectures with different power and performance trade-offs. This requires not only a careful runtime understanding of application behavior across multiple architectures, so that power and performance requirements can be met, but also an understanding of the individual and aggregated behavior of application-level and server-level performance and power metrics.

    Energy optimizing methodologies on heterogeneous data centers

    In 2013, U.S. data centers accounted for 2.2% of the country's total electricity consumption, a figure that is projected to increase rapidly over the next decade. Many important workloads are interactive, and they demand strict levels of quality-of-service (QoS) to meet user expectations, making it challenging to reduce power consumption in the face of increasing performance demands.

    Hipster: hybrid task manager for latency-critical cloud workloads

    In 2013, U.S. data centers accounted for 2.2% of the country's total electricity consumption, a figure that is projected to increase rapidly over the next decade. Many important workloads are interactive, and they demand strict levels of quality-of-service (QoS) to meet user expectations, making it challenging to reduce power consumption due to increasing performance demands. This paper introduces Hipster, a technique that combines heuristics and reinforcement learning to manage latency-critical workloads. Hipster's goal is to improve resource efficiency in data centers while respecting the QoS of the latency-critical workloads. Hipster achieves its goal by exploring heterogeneous multi-cores and dynamic voltage and frequency scaling (DVFS). To improve data center utilization and make the best use of the available resources, Hipster can dynamically assign remaining cores to batch workloads without violating the QoS constraints for the latency-critical workloads. We perform experiments using a 64-bit ARM big.LITTLE platform, and show that, compared to prior work, Hipster improves the QoS guarantee for Web-Search from 80% to 96%, and for Memcached from 92% to 99%, while reducing energy consumption by up to 18%.
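    Hipster's hybrid control loop can be sketched at a high level: an exploration phase seeds a reward lookup table, after which a learned policy picks a (core type, frequency) configuration from the measured QoS state. The configuration list, state bucketing, reward shape, and exploration scheme below are illustrative assumptions, not the paper's exact design:

```python
# Illustrative sketch (assumed details, not Hipster's actual code) of a
# hybrid heuristic + reinforcement-learning loop for a big.LITTLE system.
import random

CONFIGS = [("little", 0.6), ("little", 1.0), ("big", 1.0), ("big", 1.8)]  # (core type, GHz)

def qos_bucket(latency_ms, target_ms, buckets=10):
    """Discretize measured latency relative to the QoS target."""
    return min(int(latency_ms / target_ms * buckets), buckets - 1)

class HybridManager:
    def __init__(self, target_ms, explore_steps=50):
        self.target_ms = target_ms
        self.explore_steps = explore_steps
        self.step = 0
        # Reward lookup table: latency bucket -> running reward per config.
        self.table = {b: [0.0] * len(CONFIGS) for b in range(10)}

    def reward(self, latency_ms, power_w):
        # Penalize QoS violations heavily; otherwise reward saving power.
        return -100.0 if latency_ms > self.target_ms else -power_w

    def choose(self, latency_ms):
        self.step += 1
        if self.step <= self.explore_steps:      # exploration phase
            return random.randrange(len(CONFIGS))
        b = qos_bucket(latency_ms, self.target_ms)
        return max(range(len(CONFIGS)), key=lambda i: self.table[b][i])

    def update(self, prev_latency_ms, action, latency_ms, power_w, lr=0.1):
        b = qos_bucket(prev_latency_ms, self.target_ms)
        r = self.reward(latency_ms, power_w)
        self.table[b][action] += lr * (r - self.table[b][action])
```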

    The Hipster Approach for Improving Cloud System Efficiency

    In 2013, U.S. data centers accounted for 2.2% of the country's total electricity consumption, a figure that is projected to increase rapidly over the next decade. Many important data center workloads in cloud computing are interactive, and they demand strict levels of quality-of-service (QoS) to meet user expectations, making it challenging to optimize power consumption alongside increasing performance demands. This article introduces Hipster, a technique that combines heuristics and reinforcement learning to improve resource efficiency in cloud systems. Hipster explores heterogeneous multi-cores and dynamic voltage and frequency scaling to reduce energy consumption while managing the QoS of latency-critical workloads. To improve data center utilization and make the best use of the available resources, Hipster can dynamically assign remaining cores to batch workloads without violating the QoS constraints for the latency-critical workloads. We perform experiments using a 64-bit ARM big.LITTLE platform and show that, compared to prior work, Hipster improves the QoS guarantee for Web-Search from 80% to 96%, and for Memcached from 92% to 99%, while reducing energy consumption by up to 18%. Hipster is also effective in learning and adapting automatically to the specific requirements of new incoming workloads, just enough to meet the QoS and optimize resource consumption.
    This work has been partially supported by the European Union FP7 program through the Mont-Blanc-3 (FP7-ICT-671697) and EUROSERVER (FP7-ICT-610456) projects, by the Ministerio de Economía y Competitividad under contract Computación de Altas Prestaciones VII (TIN2015-65316-P), and the Departament d'Innovació, Universitats i Empresa de la Generalitat de Catalunya under project MPEXPAR: Models de Programació i Entorns d'Execució Paral·lels (2014-SGR-1051).
    Prior publication: Rajiv Nishtala, Paul Carpenter, Vinicius Petrucci and Xavier Martorell. Hipster: Hybrid Task Manager for Latency-Critical Cloud Workloads. In Proceedings of the 23rd IEEE International Symposium on High Performance Computer Architecture (HPCA 2017). This article extends that work in several ways. First, we present an analysis of the size of the reward lookup table and an optimization of the table to improve the scalability of our reinforcement learning mechanism. Second, we demonstrate Hipster's capability to adapt to changes in the latency-critical application at runtime while still satisfying the QoS guarantees of the new incoming applications. Lastly, we present a deployment methodology for setting up new applications managed by Hipster's runtime system.
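    The article's reward-table size analysis can be motivated with back-of-the-envelope arithmetic: a tabular policy needs one entry per (state, action) pair, so coarsening the state space shrinks the table multiplicatively. All the numbers below are invented for illustration, not taken from the article:

```python
# Illustrative arithmetic (assumed numbers, not the article's analysis):
# a tabular RL policy stores one entry per (state, action) pair, so
# bucketing the state space directly shrinks the reward lookup table.

core_maps = 2 ** 6          # hypothetical: 6 cores, each granted/withheld
dvfs_levels = 12            # hypothetical DVFS operating points
actions = core_maps * dvfs_levels

fine_load_states = 1000     # load measured at 0.1% granularity
coarse_load_states = 20     # same range bucketed into 5% bins

print("fine table entries:  ", fine_load_states * actions)    # 768000
print("coarse table entries:", coarse_load_states * actions)  # 15360
```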

    Energy optimising methodologies on heterogeneous data centres

    In 2013, U.S. data centres accounted for 2.2% of the country's total electricity consumption, a figure that is projected to increase rapidly over the next decade. A significant proportion of the power consumed within a data centre is attributed to the servers, and a large percentage of that is wasted as workloads compete for shared resources. Many data centres host interactive workloads (e.g., web search or e-commerce), for which it is critical to meet user expectations, called Quality of Service (QoS). There is also a desire to run both interactive and batch workloads on the same infrastructure to increase cluster utilisation and reduce operational costs and total energy consumption. Although much work has focused on the impact of shared resource contention, maintaining QoS for both interactive and batch workloads remains a major problem. The goal of this thesis is twofold. First, to investigate, via modelling, how and to what extent resource contention affects the throughput and power of batch workloads. Second, to introduce a scheduling approach that determines on-the-fly the best configuration to satisfy the QoS of latency-critical jobs on any architecture. To achieve these goals, we first propose a modelling technique to estimate server performance and power at runtime, called Runtime Estimation of Performance and Power (REPP). REPP's goal is to give administrators control over the power and performance of processors. REPP achieves this by estimating performance and power at multiple hardware settings (dynamic voltage and frequency scaling (DVFS) states, core consolidation and idle states) and dynamically applying these settings subject to user-defined constraints. The hardware counters required to build the models are available across architectures, making REPP architecture-agnostic. We also argue that traditional modelling and scheduling strategies are ineffective for interactive workloads. To manage such workloads, we propose Hipster, which combines a heuristic and a reinforcement learning algorithm. Hipster's goal is to improve resource efficiency while respecting the QoS of interactive workloads, which it achieves by exploring the multicore system and DVFS. To improve utilisation and make the best use of the available resources, Hipster can dynamically assign remaining cores to batch workloads without violating the QoS constraints of the interactive workloads. We implemented REPP and Hipster on real platforms, namely 64-bit commercial hardware (Intel Sandy Bridge and AMD Phenom II X4 B97) and experimental hardware (ARM big.LITTLE Juno R1). Extensive experimental results show that REPP successfully estimates the power and performance of several single-threaded and multiprogrammed workloads: the average errors on the Intel, AMD and ARM architectures are, respectively, 7.1%, 9.0% and 7.1% when predicting performance, and 8.1%, 6.5% and 6.0% when predicting power. Similarly, we show that, compared to prior work, Hipster improves the QoS guarantee for Web-Search from 80% to 96%, and for Memcached from 92% to 99%, while reducing energy consumption by up to 18% on the ARM architecture.
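    The thesis describes REPP's counter-driven models only at a high level here. As a rough illustration of the idea, one could fit a per-frequency linear model from a few widely available hardware counters to measured power; the counters chosen, the linear form, and all sample values below are assumptions for illustration, not REPP's published model:

```python
# Rough illustration (assumptions, not REPP's published model): fit a
# linear model per DVFS state from generic hardware-counter rates to
# measured power, then use it to predict power for unseen counter values.
import numpy as np

# Hypothetical training samples at 2.0 GHz: columns are counter rates
# (instructions/s, LLC misses/s, active-core count); targets are watts.
samples = {
    2.0: (np.array([[3.1e9, 2.0e6, 4.0],
                    [1.2e9, 9.0e6, 2.0],
                    [2.4e9, 4.0e6, 3.0],
                    [0.8e9, 1.0e6, 1.0]]),
          np.array([48.0, 31.0, 40.0, 22.0])),
}

models = {}
for freq, (X, y) in samples.items():
    # Least-squares fit with an intercept term appended to each row.
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    models[freq] = coef

def predict_power(freq, counters):
    """Predict watts at a given frequency from observed counter rates."""
    return float(np.dot(models[freq], np.append(counters, 1.0)))

print(predict_power(2.0, np.array([2.8e9, 3.0e6, 3.0])))
```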
